This guide covers the most common mistakes in econometric analysis and how to avoid them. Each mistake includes:
Severity Levels:
Interpreting regression coefficients as causal effects when identification assumptions aren't met. This is the most fundamental error in applied econometrics.
❌ Wrong: "The regression shows that education causes a 7.8% increase in income."
Problem: This ignores ability bias, family background, and selection.
✅ Correct: "Education is associated with 7.8% higher income. If we could eliminate ability bias and other confounders, this might represent a causal effect."
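To make the ability-bias problem concrete, here is a minimal simulation under an assumed, hypothetical data-generating process in which unobserved ability raises both schooling and log income. The helper names (`ols_slope`, `residualize`, `simulate`) and all parameter values are illustrative. The naive slope absorbs part of ability's effect. The adjusted slope, obtained by partialling ability out of education (Frisch-Waugh-Lovell), recovers the true 0.08 only because the simulation lets us observe ability, which real data usually cannot.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(x, z):
    """Residuals from regressing x on z (with an intercept)."""
    b = ols_slope(z, x)
    a = mean(x) - b * mean(z)
    return [xi - (a + b * zi) for xi, zi in zip(x, z)]


def simulate(n=20_000, true_effect=0.08, seed=42):
    """Hypothetical DGP: unobserved ability raises both schooling and log income."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    educ = [12 + 2 * a + rng.gauss(0, 2) for a in ability]
    log_income = [
        10 + true_effect * e + 0.10 * a + rng.gauss(0, 0.3)
        for e, a in zip(educ, ability)
    ]
    return ability, educ, log_income


ability, educ, log_income = simulate()

# Short regression: log income on education only (ability omitted).
naive = ols_slope(educ, log_income)

# Long-regression coefficient via Frisch-Waugh-Lovell: partial ability out of
# education first. Only possible here because ability was simulated.
adjusted = ols_slope(residualize(educ, ability), log_income)

print(f"naive: {naive:.3f}, controlling for ability: {adjusted:.3f}")
```

With these settings the omitted-variable-bias formula predicts a naive slope near 0.08 + 0.10 × 2/8 ≈ 0.105, so the "7.8%"-style estimate from a short regression would overstate the true return.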
Incorrectly interpreting coefficients in log-linear regressions, especially for large effects or binary variables.
| Coefficient | ❌ Wrong Interpretation | ✅ Correct Interpretation |
|---|---|---|
| β = 0.08 | 8% increase | ≈ 8.3% increase in Y: (e^0.08 - 1) × 100 = 8.3% (β is 0.08 log points, not percentage points) |
| β = 0.5 (binary X) | 50% increase | 65% increase: (e^0.5 - 1) × 100 = 64.9% |
| β = -0.2 | 20% decrease | 18% decrease: (e^-0.2 - 1) × 100 = -18.1% |
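A small helper (the name `pct_change` is hypothetical) reproduces the exact conversions in the table, assuming the model is log(Y) = α + βX + ε and X rises by one unit (or switches from 0 to 1 for a binary X):

```python
import math


def pct_change(beta):
    """Exact % change in Y for a one-unit change in X in log(Y) = a + beta*X."""
    return (math.exp(beta) - 1) * 100


for b in (0.08, 0.5, -0.2):
    # 100*beta is only a first-order approximation; it degrades as |beta| grows.
    print(f"beta = {b:+.2f}: approx {b * 100:+.1f}%, exact {pct_change(b):+.1f}%")
```

The gap between the approximate and exact columns is small at β = 0.08 but large at β = 0.5, which is why the exact formula matters for big effects and dummy variables.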
Focusing only on p-values without considering economic magnitude or practical importance.
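As an illustration, the sketch below reports a coefficient's t-statistic and two-sided p-value (normal approximation via `math.erfc`) next to its size in standard deviations of the outcome. The `summarize` helper is hypothetical and both sets of inputs are invented numbers: a precisely estimated but negligible effect looks highly "significant", while a sizeable but imprecise effect does not.

```python
import math


def two_sided_p(t):
    """Two-sided p-value for a t-statistic, using the normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))


def summarize(beta, se, sd_y):
    """Report statistical and economic significance side by side."""
    t = beta / se
    return {
        "t": t,
        "p": two_sided_p(t),
        "effect_in_sd_of_y": beta / sd_y,  # standardized magnitude
    }


# Hypothetical: a tiny effect estimated very precisely in a huge sample.
tiny = summarize(beta=0.002, se=0.0005, sd_y=1.0)
# Hypothetical: a large effect estimated imprecisely in a small sample.
large = summarize(beta=0.30, se=0.20, sd_y=1.0)

print(tiny)
print(large)
```

Neither p-value alone says whether the effect matters; the second case is not evidence of a zero effect, only of an imprecise estimate.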
Controlling for variables that are outcomes of the treatment (bad controls) or that create collider bias.
| Control Type | Include? | Example |
|---|---|---|
| Pre-treatment confounders | Always; never exclude them | Age, baseline education, family background |
| Post-treatment outcomes | NEVER: these are bad controls | Income when studying the effect of education |
| Mediators | Only in mediation analysis | Job search when studying training effects |
| Colliders | NEVER: conditioning on them creates bias | Selection into the sample based on both treatment and outcome |
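The collider problem shows up clearly in a short simulation. Under an assumed, hypothetical data-generating process, treatment raises the outcome one-for-one, and a third variable is caused by both treatment and outcome. Adding that variable as a control (computed here by Frisch-Waugh-Lovell partialling) drives the estimated effect from about 1 to roughly 0, even though the true effect never changed.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


def residualize(x, z):
    """Residuals from regressing x on z (with an intercept)."""
    b = ols_slope(z, x)
    a = mean(x) - b * mean(z)
    return [xi - (a + b * zi) for xi, zi in zip(x, z)]


rng = random.Random(7)
n = 20_000
treat = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical DGP: the true effect of treatment on the outcome is 1.
outcome = [1.0 * t + rng.gauss(0, 1) for t in treat]
# The collider is caused by BOTH treatment and outcome.
collider = [t + y + rng.gauss(0, 1) for t, y in zip(treat, outcome)]

no_control = ols_slope(treat, outcome)
# Coefficient on treatment in a regression of outcome on treatment + collider.
bad_control = ols_slope(residualize(treat, collider), outcome)

print(f"without collider: {no_control:.3f}, controlling for collider: {bad_control:.3f}")
```

In this particular design the population coefficient on treatment, once the collider is controlled for, is exactly zero, so the "adjusted" regression would wrongly suggest the treatment does nothing.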
Testing many specifications until finding significant results, then reporting only the "best" model without acknowledging the search process.
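The cost of an undisclosed specification search can be quantified with a back-of-the-envelope calculation. The sketch assumes k independent tests of true null hypotheses at α = 0.05; real specifications estimated on the same data are correlated, which tends to make the inflation smaller than these figures but does not remove it. The Bonferroni cutoff is one simple, conservative correction, and the function names are hypothetical.

```python
def prob_any_significant(k, alpha=0.05):
    """Chance that at least one of k independent tests of true nulls rejects."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(k, alpha=0.05):
    """Per-test p-value cutoff that caps the family-wise error rate at alpha."""
    return alpha / k


for k in (1, 5, 20, 50):
    print(
        f"{k:>2} specifications: "
        f"P(at least one p < 0.05) = {prob_any_significant(k):.2f}, "
        f"Bonferroni cutoff = {bonferroni_threshold(k):.4f}"
    )
```

With 20 specifications and no true effect, the chance of finding at least one "significant" result is roughly 64%, which is why the search process itself must be reported.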
Running regressions without checking whether key assumptions are satisfied, leading to invalid inference.
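One example of such a check is a test for heteroskedasticity. The sketch implements Koenker's n·R² version of the Breusch-Pagan test for a single regressor in plain Python: regress the squared OLS residuals on x and compare n·R² with a χ² distribution with one degree of freedom. The simulated data (error spread growing with x) and the helper names are hypothetical; in practice you would use your software's built-in test (e.g., `estat hettest` after `regress` in Stata) and also look at residual plots.

```python
import math
import random
from statistics import mean


def ols_fit(x, y):
    """Intercept and slope of a bivariate OLS regression of y on x."""
    mx, my = mean(x), mean(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b


def r_squared(x, y):
    """R-squared of a bivariate OLS regression of y on x."""
    a, b = ols_fit(x, y)
    my = mean(y)
    ss_res = sum((c - (a + b * xi)) ** 2 for xi, c in zip(x, y))
    ss_tot = sum((c - my) ** 2 for c in y)
    return 1 - ss_res / ss_tot


def breusch_pagan_koenker(x, y):
    """Koenker's n*R^2 Breusch-Pagan statistic and p-value (one regressor)."""
    a, b = ols_fit(x, y)
    resid_sq = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    lm = len(x) * r_squared(x, resid_sq)
    p_value = math.erfc(math.sqrt(lm / 2))  # upper tail of chi-squared(1)
    return lm, p_value


rng = random.Random(1)
x = [rng.uniform(0, 3) for _ in range(2_000)]
# Hypothetical DGP: the error standard deviation grows with x.
y = [1 + 2 * xi + rng.gauss(0, 0.2 + xi) for xi in x]

lm, p = breusch_pagan_koenker(x, y)
print(f"LM = {lm:.1f}, p = {p:.2g}")
```

A tiny p-value here signals that conventional standard errors are unreliable and robust ones are needed.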
Using instruments that are only weakly correlated with the endogenous variable, leading to biased and imprecise estimates.
| First-Stage F-stat (one instrument) | Interpretation | Action Needed |
|---|---|---|
| F > 104.7 | Strong instrument: conventional 5% t-tests have approximately correct size | Proceed with IV |
| 10 < F < 104.7 | Moderately strong: conventional t-tests can over-reject | Report weak-IV-robust tests (e.g., Anderson-Rubin) |
| F < 10 | Weak instrument (common rule of thumb) | Find a better instrument or abandon IV |
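With a single instrument and one endogenous regressor, the first-stage F-statistic is the squared t-statistic on the instrument. The sketch below computes it with conventional (homoskedastic) standard errors on simulated data for a hypothetical strong and a hypothetical nearly irrelevant instrument, then applies the 10 and 104.7 cutoffs. With heteroskedastic or clustered errors, a robust first-stage F should be used instead; this sketch does not compute one. The helper names are hypothetical.

```python
import math
import random
from statistics import mean


def first_stage_f(z, x):
    """First-stage F for one instrument: squared t-stat on z (homoskedastic SE)."""
    n = len(z)
    mz, mx = mean(z), mean(x)
    szz = sum((a - mz) ** 2 for a in z)
    pi = sum((a - mz) * (b - mx) for a, b in zip(z, x)) / szz
    resid = [b - mx - pi * (a - mz) for a, b in zip(z, x)]
    sigma2 = sum(r ** 2 for r in resid) / (n - 2)
    t = pi / math.sqrt(sigma2 / szz)
    return t ** 2


def classify(f):
    """Apply the 104.7 and 10 cutoffs from the rule of thumb."""
    if f > 104.7:
        return "strong"
    if f > 10:
        return "moderate: use weak-IV-robust inference"
    return "weak"


rng = random.Random(3)
n = 1_000
z = [rng.gauss(0, 1) for _ in range(n)]

results = {}
for label, strength in (("strong", 0.5), ("weak", 0.01)):
    # Hypothetical first stage: x = strength * z + noise.
    x = [strength * zi + rng.gauss(0, 1) for zi in z]
    results[label] = first_stage_f(z, x)
    print(f"{label} design: F = {results[label]:.1f} -> {classify(results[label])}")
```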
Using inappropriate standard errors that don't account for data structure, leading to wrong statistical inference.
| Data Structure | Standard Error Type | Stata Command | R Command |
|---|---|---|---|
| Heteroskedastic errors | Robust (Huber-White) | reg y x, robust | estimatr::lm_robust(y ~ x, se_type = "HC1") |
| Clustered data | Cluster-robust | reg y x, cluster(id) | estimatr::lm_robust(y ~ x, clusters = id) |
| Panel data | Cluster-robust by panel unit | xtreg y x, fe robust | fixest::feols(y ~ x \| id, cluster = ~id) |
| Survey data | Design-based (weights, strata, PSUs) | svyset first, then svy: reg y x | survey::svyglm(y ~ x, design = survey_design) |
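Why clustering matters can be shown with a simulation: when the regressor varies only at the cluster level and errors share a cluster-level shock, conventional standard errors are far too small. The sketch computes both for a bivariate OLS in plain Python, using the Stata-style small-sample factor G/(G-1) × (N-1)/(N-K) for the clustered version. The data-generating process, cluster sizes, and the `ols_with_ses` helper are all hypothetical.

```python
import math
import random
from statistics import mean


def ols_with_ses(x, y, cluster):
    """Bivariate OLS slope with conventional and cluster-robust standard errors."""
    n = len(x)
    mx, my = mean(x), mean(y)
    xd = [a - mx for a in x]
    sxx = sum(a * a for a in xd)
    b = sum(a * (c - my) for a, c in zip(xd, y)) / sxx
    resid = [c - my - b * a for a, c in zip(xd, y)]

    # Conventional (i.i.d.) standard error.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_iid = math.sqrt(s2 / sxx)

    # Cluster-robust: add up the score x_i * e_i within each cluster first.
    scores = {}
    for g, a, e in zip(cluster, xd, resid):
        scores[g] = scores.get(g, 0.0) + a * e
    n_clusters = len(scores)
    meat = sum(s * s for s in scores.values())
    factor = n_clusters / (n_clusters - 1) * (n - 1) / (n - 2)  # K = 2
    se_cluster = math.sqrt(factor * meat) / sxx
    return b, se_iid, se_cluster


rng = random.Random(11)
n_groups, per_group = 50, 40
x, y, cluster = [], [], []
for g in range(n_groups):
    x_g = rng.gauss(0, 1)  # regressor varies only at the cluster level
    u_g = rng.gauss(0, 1)  # shock shared by everyone in the cluster
    for _ in range(per_group):
        x.append(x_g)
        y.append(0.5 * x_g + u_g + rng.gauss(0, 1))
        cluster.append(g)

b, se_iid, se_cluster = ols_with_ses(x, y, cluster)
print(f"slope = {b:.3f}, conventional SE = {se_iid:.3f}, clustered SE = {se_cluster:.3f}")
```

With 40 observations per cluster and half of the error variance at the cluster level, the Moulton factor suggests clustered SEs roughly √(1 + 39 × 0.5) ≈ 4.5 times the conventional ones; the simulated ratio should fall in that neighborhood.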
Assuming parallel trends without testing, or proceeding with DiD when pre-trends are clearly different.
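A basic pre-trend check compares the treated-minus-control gap in every period with the gap just before treatment, which is what event-study coefficients estimate. The sketch uses hypothetical group means and no standard errors; in practice you would estimate an event-study regression with unit and time fixed effects and clustered standard errors, then plot the coefficients with confidence intervals. The function name is hypothetical.

```python
def event_study_gaps(treated_means, control_means, ref_period):
    """Treated-minus-control gap per period, normalized to zero in ref_period.

    Under parallel trends, the pre-treatment entries should be close to zero.
    """
    gaps = [t - c for t, c in zip(treated_means, control_means)]
    return [g - gaps[ref_period] for g in gaps]


# Hypothetical group means for periods 0-5; treatment starts in period 4.
control = [8, 9, 10, 11, 12, 13]
parallel = [10, 11, 12, 13, 16, 17]
diverging = [10, 11.5, 13, 14.5, 17.5, 19]

print(event_study_gaps(parallel, control, ref_period=3))   # pre-period entries all 0
print(event_study_gaps(diverging, control, ref_period=3))  # pre-period entries trend upward
```

In the diverging case, a naive comparison of periods 3 and 4 attributes a jump of 2.0 to treatment, but the gap was already widening by 0.5 per period beforehand, so the parallel-trends assumption is not credible.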
Choosing bandwidth to get desired results rather than using data-driven optimal selection.
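The sensitivity that makes bandwidth shopping tempting is easy to demonstrate. This sketch estimates a sharp RD jump with local linear fits (uniform kernel) on each side of the cutoff for several bandwidths, using hypothetical simulated data whose true jump is 0.5 and whose curvature above the cutoff biases wide-bandwidth fits. It does not implement an MSE-optimal bandwidth selector; for that, use a dedicated tool such as `rdbwselect` from the rdrobust package, fix the bandwidth rule before looking at results, and report estimates across a range of bandwidths. The helper names are hypothetical.

```python
import random
from statistics import mean


def intercept_at_zero(x, y):
    """Intercept of a bivariate OLS line, i.e. the fitted value at x = 0."""
    mx, my = mean(x), mean(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx


def rd_estimate(x, y, h, cutoff=0.0):
    """Sharp RD jump at the cutoff: separate linear fits within bandwidth h."""
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff - h <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + h]
    return intercept_at_zero(*zip(*right)) - intercept_at_zero(*zip(*left))


rng = random.Random(5)
n = 4_000
x = [rng.uniform(-1, 1) for _ in range(n)]
# Hypothetical DGP: true jump of 0.5, plus curvature only above the cutoff.
y = [
    0.5 + xi + 0.5 * (xi >= 0) + 1.5 * xi ** 2 * (xi >= 0) + rng.gauss(0, 0.3)
    for xi in x
]

estimates = {h: rd_estimate(x, y, h) for h in (0.1, 0.2, 0.5, 1.0)}
for h, est in estimates.items():
    print(f"bandwidth {h:.1f}: estimated jump = {est:.3f}")
```

Narrow bandwidths stay close to the true 0.5 but are noisier; the widest bandwidth is biased toward roughly 0.25 by the unmodeled curvature. A researcher free to pick among these could report almost any answer, which is why the bandwidth rule must be data-driven and fixed in advance.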
Regression tables and figures that are hard to read, poorly labeled, or missing essential information.
Not acknowledging limitations or discussing what could go wrong with the analysis.
Before submitting any econometric analysis, ask yourself:
If you can't answer "yes" to all these questions, keep working on your analysis.
Common Mistakes & Diagnostics Guide • ImpactMojo 101 Knowledge Series
Licensed under CC BY-NC-SA 4.0 • Free to use with attribution • www.impactmojo.in
Remember: Good econometrics is about being honest with the data and transparent about limitations. When in doubt, be conservative in your claims.